Skip to content

Session refactor #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 4, 2017
Merged

Session refactor #6

merged 2 commits into from
Mar 4, 2017

Conversation

mayfield
Copy link
Contributor

@mayfield mayfield commented Mar 1, 2017

This is a review seeking PR for @smtakeda (and @blandonnimrat)

It does test well for me on python3.5 and python2.7 but I would consider it as part of ongoing work to increase download concurrency safely. Please feel free to provide feedback and criticism here and/or merge if you are happy enough with the intent.

I took some liberty in renaming existing mechanisms where it helped me make sense of the various pieces falling together but I can backtrack on them if they're undesired (named access_url -> fetch).

There is more overview in the commit logs as well.

Make session management explicitly managed by callers to `access_url`.
This enables each caller to protect the session from threading issues
in a way that best matches their particular model.  For Snowflakerestful
this is managed as a pool of sessions;  For chunk_downloader I will
implement something similar that provides for good reuse and cleanup
of each session while achieving as much connection pooling as can be
done safely.
Fairly large refactor in the spirit of consolidating HTTP call-patterns
and configuration.

Changed `SnowflakeRestful.access_url` from a staticmethod to a regular
instance method, reborn as a `fetch`.  The name change helps distinguish
the type of activity being performed, but is otherwise cosmetic.

Requests session pooling is manged in one place by the SnowflakeRestful
instance regardless of the type of HTTP activity.  This provides the best
session reuse possible with consistent thread safety characteristics.
Note that the `requests.Session` objects are created outside their final
execution thread, but they are never used by more than one request thread
at a time.

Other changes:
 - Moved network.py logger instance to module level.
 - DRY optimizations for _get_request, _post_request, etc. Namely
   consolidation of proxy and timeout settings.
 - Default `fetch`'s keyword arg `token` to `network.NOTOKEN`.
     I believe this is semantically acceptable (or desirable) but
     should be reviewed.
 - Renamed `chunk_downloader`'s `_get_request` to `_fetch_chunk` to
   more aptly associate it with `SnowflakeRestful.fetch()`.
 - Renamed `request_thread` function to `request_exec` as it is not
   always a thread target.
@smtakeda
Copy link
Contributor

smtakeda commented Mar 1, 2017

Thanks @mayfield! I'm OOO this week, so I'll check out by next week.

network.py Outdated
self._session = None
sessions = list(self._active_sessions)
if sessions:
self.logger.warn("Closing %d active sessions" % len(sessions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use parameterized logging, e.g., self.logger.warn("Closing %s active sessions", len(sessions))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only avoid that syntax because it only works for logging functions and I sometimes switch back and forth with other printers like print(). But I can use printf style formatting from now on.

Copy link
Contributor

@smtakeda smtakeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I'll fix logger and syntax error later.

s = requests.Session()
s.mount(u'http://', HTTPAdapter(max_retries=REQUESTS_RETRY))
s.mount(u'https://', HTTPAdapter(max_retries=REQUESTS_RETRY))
s._reuse_count = itertools.count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be used later or just monitoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was using it before these commits were pushed. I'm probably going to implement session garbage collection and may use reuse_count as a mechanism for deciding when to close out old sessions. But if not I'll yank before next PR. Thanks!

request_thread_timeout = 60 # one request thread timeout

def request_thread(result_queue):
def fetch(self, method, full_url, headers, *, data=None, timeout=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes an exception in python 2.7.

Traceback (most recent call last):
  File "/home/stakeda/basic_test.py", line 13, in <module>
    import snowflake.connector
  File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/__init__.py", line 21, in <module>
    from .connection import SnowflakeConnection
  File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/connection.py", line 14, in <module>
    from . import network
  File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/network.py", line 607
    def fetch(self, method, full_url, headers, *, data=None, timeout=None, **kwargs):
                                                ^
SyntaxError: invalid syntax

Maybe just remove * for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I was pretty sure I tested that particular signature on 2.7, but I do recall that keyword only arguments handling changed in py3k. (Disclaimer: I rarely use python2)

@smtakeda smtakeda merged commit 1d323ea into snowflakedb:master Mar 4, 2017
@mayfield mayfield deleted the session_pool branch March 4, 2017 17:31
sfc-gh-aling pushed a commit that referenced this pull request Oct 17, 2022
* Thread safety refactor of http session management.

Make session management explicitly managed by callers to `access_url`.
This enables each caller to protect the session from threading issues
in a way that best matches their particular model.  For Snowflakerestful
this is managed as a pool of sessions;  For chunk_downloader I will
implement something similar that provides for good reuse and cleanup
of each session while achieving as much connection pooling as can be
done safely.

* Session consolidation

Fairly large refactor in the spirit of consolidating HTTP call-patterns
and configuration.

Changed `SnowflakeRestful.access_url` from a staticmethod to a regular
instance method, reborn as a `fetch`.  The name change helps distinguish
the type of activity being performed, but is otherwise cosmetic.

Requests session pooling is manged in one place by the SnowflakeRestful
instance regardless of the type of HTTP activity.  This provides the best
session reuse possible with consistent thread safety characteristics.
Note that the `requests.Session` objects are created outside their final
execution thread, but they are never used by more than one request thread
at a time.

Other changes:
 - Moved network.py logger instance to module level.
 - DRY optimizations for _get_request, _post_request, etc. Namely
   consolidation of proxy and timeout settings.
 - Default `fetch`'s keyword arg `token` to `network.NOTOKEN`.
     I believe this is semantically acceptable (or desirable) but
     should be reviewed.
 - Renamed `chunk_downloader`'s `_get_request` to `_fetch_chunk` to
   more aptly associate it with `SnowflakeRestful.fetch()`.
 - Renamed `request_thread` function to `request_exec` as it is not
   always a thread target.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants